Diagnostic accuracy of symptoms for an underlying disease: a simulation study

Symptoms have been used to diagnose conditions such as frailty and mental illnesses. However, the diagnostic accuracy of the numbers of symptoms has not been well studied. This study aims to use equations and simulations to demonstrate how the factors that determine symptom incidence influence symptoms’ diagnostic accuracy for disease diagnosis. Assuming a disease that causes symptoms and is correlated with another disease, we simulated 10,000 subjects in whom 40 symptoms occurred based on three epidemiological measures: proportions diseased, baseline symptom incidence (among those not diseased), and risk ratios. Symptoms occurred with similar correlation coefficients. The sensitivities and specificities of single symptoms for disease diagnosis were expressed as equations using the three epidemiological measures and approximated using linear regression in simulated populations. The areas under the curves (AUCs) of the receiver operating characteristic (ROC) curves were the measure used to determine the diagnostic accuracy of multiple symptoms, derived by using 2 to 40 symptoms for disease diagnosis. For each AUC, we chose the best set of sensitivity and specificity, defined as the set whose sum differed most from 1 in absolute value. The results showed that the sensitivities and specificities of single symptoms for disease diagnosis were fully explained by the three epidemiological measures in simulated subjects. The AUCs increased or decreased with more symptoms used for disease diagnosis when the risk ratios were greater or less than 1, respectively. Based on the AUCs, when risk ratios were close to 1, symptoms did not provide diagnostic value. When risk ratios were greater or less than 1, maximal or minimal AUCs usually could be reached with fewer than 30 symptoms. The maximal AUCs and their best sets of sensitivities and specificities could be well approximated with the three epidemiological measures and their interaction terms (adjusted R-squared ≥ 0.69). However, the observed overall symptom correlations, overall symptom incidence, and numbers of symptoms explained only a small fraction of the AUC variances (adjusted R-squared ≤ 0.03). In conclusion, the sensitivities and specificities of single symptoms for disease diagnosis can be explained fully by the at-risk incidence and 1 minus the baseline incidence, respectively. The epidemiological measures and baseline symptom correlations can explain large fractions of the variances of the maximal AUCs and the best sets of sensitivities and specificities. These findings are important for researchers who want to assess the diagnostic accuracy of composite diagnostic criteria.

www.nature.com/scientificreports/ In a simplistic hypothetical case (see Table 1), a proportion of a population is affected by a disease, denoted by d, and a constant and random rate of incidence occurs for one symptom at a particular time point, denoted by ir. For those affected by the disease, the incidence rate of the symptoms increases by a risk ratio, denoted by rr. With respect to the individuals diseased, the proportion of those presenting the symptom was d × ir × rr, and the proportion of those not presenting the symptom was d × (1 − ir × rr). Regarding those not diseased, the proportion of those presenting the symptom was (1 − d) × ir and the proportion of those not presenting the symptom was (1 − d) × (1 − ir). The sensitivity 11 of this symptom for detecting the disease equaled (d × ir × rr)/d = ir × rr, and the specificity was [(1 − d) × (1 − ir)]/(1 − d) = 1 − ir. Based on these calculations, the symptom incidence (and the risk ratio) should connect to the diagnostic test accuracy of the symptoms for the disease.
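The single-symptom case above can be checked numerically. The following is a minimal Python sketch, not the authors' R code from Appendix 1; the function name `single_symptom_accuracy` and the example values (d = 0.05, ir = 0.1, rr = 2) are chosen here for illustration only.

```python
def single_symptom_accuracy(d, ir, rr):
    """Return the four cell proportions, sensitivity, and specificity for one symptom.

    d  : proportion of the population with the disease
    ir : baseline symptom incidence (among those not diseased)
    rr : risk ratio of the symptom among those diseased
    """
    at_risk = ir * rr  # symptom incidence among those diseased
    if not (0 < d < 1 and 0 <= ir <= 1 and 0 <= at_risk <= 1):
        raise ValueError("need 0 < d < 1, 0 <= ir <= 1, and ir * rr <= 1")
    cells = {
        "diseased_symptom": d * at_risk,
        "diseased_no_symptom": d * (1 - at_risk),
        "not_diseased_symptom": (1 - d) * ir,
        "not_diseased_no_symptom": (1 - d) * (1 - ir),
    }
    sensitivity = cells["diseased_symptom"] / d                 # equals ir * rr
    specificity = cells["not_diseased_no_symptom"] / (1 - d)    # equals 1 - ir
    return cells, sensitivity, specificity


# Illustrative values only.
cells, sens, spec = single_symptom_accuracy(d=0.05, ir=0.1, rr=2)
print(sens, spec)
```

The proportion diseased cancels out of both ratios, which is why sensitivity depends only on the at-risk incidence and specificity only on the baseline incidence.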
In Table 2, a two-symptom case is hypothesized. Two symptoms are associated with the occurrence of the disease. There are incidence rates and risk ratios for presenting both symptoms, one, or none. The correlations between the symptoms can influence the co-occurrence of multiple symptoms and thus the joint incidence ($ir_{both}$) and joint risk ratios ($rr_{both}$) 2. The sensitivity and specificity 11 of presenting two symptoms for the detection of the disease are given in Table 2.
Table 1. Diagnostic accuracy of a single symptom to diagnose the disease cause in equations. Equations are derived based on Baratloo et al.'s definitions 11. d, proportions in a population with the disease; ir, symptom incidence rate; rr, risk ratios.
Table 2. … 11. both, both symptoms presenting due to the disease that caused the symptoms; d, proportions in a population with the disease; ir, symptom incidence rate; one, one symptom presenting; rr, risk ratios.
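As a hedged illustration of the two-symptom case, the single-symptom derivation can be repeated for one particular diagnostic rule. The block below assumes, purely for illustration, that a subject tests positive only when both symptoms are present, that $ir_{both}$ is the joint incidence among those not diseased, and that $rr_{both}$ is the ratio of the joint incidence among those diseased to $ir_{both}$. These definitions are assumptions made here and may differ from the entries of the original Table 2.

```latex
\begin{align}
\text{sensitivity}_{both} &= \frac{d \times ir_{both} \times rr_{both}}{d}
  = ir_{both} \times rr_{both}, \\
\text{specificity}_{both} &= \frac{(1 - d) \times (1 - ir_{both})}{1 - d}
  = 1 - ir_{both}.
\end{align}
```

Under these assumptions the proportion diseased again cancels, but the joint quantities depend on the correlation between the two symptoms, so they cannot be obtained from the single-symptom values alone.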
Simulation procedures. The R code used to simulate individuals with assumed epidemiological measures is in Appendix 1. For each simulation, we chose a combination of the above-mentioned epidemiological measures, including disease incidence, associations between diseases, and symptom risk ratios. In each simulation, we created 10,000 individuals and randomly assigned them disease statuses based on the assumed proportions diseased.
We also randomly assigned the other associated disease based on its correlations with the main disease using an established method 18,19. The probability of individuals developing symptoms differed by whether they were diseased or not. Among those diseased, the probability of developing a symptom was the product of its baseline incidence and an assumed risk ratio. Among those not diseased, the probability of developing a symptom was the baseline incidence of the symptom. We created 40 symptoms at the same time. We considered the correlations between the 40 symptoms, and we randomly assigned the symptoms based on disease statuses 18,19.

Table 3. Assumptions and the assessments of the simulated symptoms. AUC area under curve.
1  2 diseases of interest: one disease directly related to the symptoms and the other associated with the disease only (unrelated to symptoms)
2  Similar baseline incidence rates among those not diseased and similar risk ratios for the symptoms
3  Accurate disease statuses; symptoms reported accurately by patients
4  The products of baseline incidence rates and risk ratios less than or equal to 1
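A simplified version of the simulation procedure can be sketched as follows. This is an illustrative Python sketch rather than the R code in Appendix 1, and it makes two simplifying assumptions that the study does not: symptoms are generated independently (no correlations between symptoms), and the associated second disease is omitted. The function names and default values are hypothetical.

```python
import random


def simulate(n=10_000, d=0.05, ir=0.1, rr=2.0, n_symptoms=40, seed=1):
    """Simulate disease status and independent binary symptoms for n subjects."""
    rng = random.Random(seed)
    at_risk = min(ir * rr, 1.0)  # incidence among those diseased, capped at 1
    diseased, symptoms = [], []
    for _ in range(n):
        is_diseased = rng.random() < d
        p = at_risk if is_diseased else ir
        diseased.append(is_diseased)
        symptoms.append([rng.random() < p for _ in range(n_symptoms)])
    return diseased, symptoms


def sens_spec_single(diseased, symptoms, j):
    """Observed sensitivity and specificity of symptom j for the disease."""
    tp = sum(1 for dz, s in zip(diseased, symptoms) if dz and s[j])
    tn = sum(1 for dz, s in zip(diseased, symptoms) if not dz and not s[j])
    n_dis = sum(diseased)
    return tp / n_dis, tn / (len(diseased) - n_dis)


diseased, symptoms = simulate()
sens, spec = sens_spec_single(diseased, symptoms, 0)
print(round(sens, 3), round(spec, 3))
```

With these inputs the observed sensitivity and specificity should fall near ir × rr = 0.2 and 1 − ir = 0.9, up to sampling error. Because only about 5% of subjects are diseased, the sensitivity estimate is noticeably noisier than the specificity estimate.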
Diagnostic test accuracy of symptoms. We first described the diagnostic accuracy of the symptoms to detect the disease and the other associated disease using equations. Then we used the data obtained from simulations to validate the equations. We defined sensitivity as the number of true cases identified by a symptom or symptoms (more than the number required by a threshold) divided by the number of those diseased 11,20. We defined specificity as the number of non-cases identified by the absence of a symptom or symptoms (using the same threshold as the sensitivity) divided by the number of those not diseased. The areas under the receiver operating characteristic (ROC) curves and the 95% CIs were derived when using more than one symptom to detect the disease status 21. We compared the areas under the curves (AUCs) that were derived from using different numbers of symptoms for disease diagnosis 21. We chose the best set of sensitivity and specificity on an ROC curve by searching for the set with the maximal absolute difference between 1 and the sum of sensitivity and specificity 22. We reported the number of symptoms and the sensitivity and specificity of the best set.
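The threshold-based definitions above can be illustrated with a short Python sketch. It is not the authors' implementation: the AUC here is computed with a plain trapezoidal rule and without confidence intervals, and the function names and the eight-subject toy data are made up for illustration.

```python
def roc_points(counts, diseased):
    """Sensitivity and specificity for every rule 'symptom count >= t'."""
    n_dis = sum(diseased)
    n_not = len(diseased) - n_dis
    points = []
    for t in range(0, max(counts) + 2):
        tp = sum(1 for c, dz in zip(counts, diseased) if dz and c >= t)
        tn = sum(1 for c, dz in zip(counts, diseased) if not dz and c < t)
        points.append((t, tp / n_dis, tn / n_not))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    # Convert to (false-positive rate, sensitivity), ordered from (0, 0) to (1, 1).
    xy = sorted((1 - spec, sens) for _, sens, spec in points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


def best_set(points):
    """Threshold whose sensitivity + specificity differs most from 1 in absolute value."""
    return max(points, key=lambda p: abs(p[1] + p[2] - 1))


# Toy data: symptom counts and disease status for eight hypothetical subjects.
counts = [0, 1, 1, 2, 3, 3, 4, 5]
diseased = [False, False, False, False, True, False, True, True]
points = roc_points(counts, diseased)
auc = auc_trapezoid(points)
best = best_set(points)
print(auc, best)
```

Taking the absolute value matters when risk ratios are below 1: the ROC curve then lies under the diagonal, and the most informative threshold is the one farthest below it.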
Approximation of diagnostic accuracy and symptom correlations. We approximated the correlations between symptoms and diagnostic accuracy, including the sensitivities and specificities of single symptoms for disease diagnosis in simulated populations, with epidemiological measures using linear regression. Using linear regression models to approximate complicated measures has been proven to be an effective method to understand the role or importance of various factors in these measures. We considered the correlations between symptoms or the diagnostic accuracy to be the dependent variable ($Y$), and approximated it by using the above-mentioned epidemiological measures with or without their interaction terms (denoted as $x_i$, with $i$ ranging from 1 to the total number of independent variables in a regression model). The equation was $Y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n$, where $\alpha_0$ denoted the intercept, $\alpha_i$ denoted the regression coefficients, and $n$ was the number of independent variables. The implementation of the regression models is available in the R code in Appendix 1. We have used this approach to interpret principal components 23-25, determine life stages 26,27, interpret the diagnosis of frailty syndrome 1, and demonstrate the biases generated by the diagnostic criteria of mental illnesses 2. We conducted all the statistical analyses using the R environment (v3.5.1, R Foundation for Statistical Computing, Vienna, Austria) 28 and RStudio (v1.1.463, RStudio, Inc., Boston, MA) 29.
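The role of interaction terms in such an approximation can be illustrated with a small, self-contained Python sketch. It is not the regression code in Appendix 1: it fits ordinary least squares through the normal equations on a noiseless toy grid in which the single-symptom sensitivity equals ir × rr, and the helper names `ols` and `adjusted_r2` are introduced here for illustration.

```python
def ols(X, y):
    """Ordinary least squares; an intercept column is added to X here."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations (X'X) a = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def adjusted_r2(X, y, coef):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n, p = len(y), len(coef) - 1
    fitted = [coef[0] + sum(c * x for c, x in zip(coef[1:], r)) for r in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - p - 1)


# Toy grid of baseline incidence (ir) and risk ratio (rr); ir * rr stays below 1.
grid = [(ir, rr) for ir in (0.05, 0.1, 0.2) for rr in (0.5, 1.0, 2.0, 4.0)]
sens = [ir * rr for ir, rr in grid]            # noiseless single-symptom sensitivity

X_main = [[ir, rr] for ir, rr in grid]         # main effects only
X_full = [[ir, rr, ir * rr] for ir, rr in grid]  # plus the interaction term
coef_main, coef_full = ols(X_main, sens), ols(X_full, sens)
print(adjusted_r2(X_main, sens, coef_main), adjusted_r2(X_full, sens, coef_full))
```

In this toy setting the interaction term ir × rr is the at-risk incidence itself, so the model that includes it reproduces the sensitivities exactly, while the main-effects model leaves part of the variance unexplained.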

Results
Quality of simulations and symptom incidence. The derived baseline incidence rates of single symptoms matched the assumed incidence rates, regardless of the assumed proportions diseased, assumed risk ratios, and assumed correlations between symptoms (Appendix 2). The derived risk ratios matched the assumed risk ratios when the at-risk incidence (incidence among those diseased) was less than 1 (Appendix 2). Figure 2 presents the symptom incidence among all subjects. The overall symptom incidence depended on the proportions diseased, baseline symptom incidence, and symptom risk ratios. Based on the similarities between the assumed and derived values, the simulations were well implemented.
Correlations between symptoms. The correlations between the symptoms ranged from −0.02 to 0.99 in all simulations (see Table 4). The effects of the assumed epidemiological measures on the correlations between symptoms in the linear regression models depended on whether the at-risk incidence reached 1. The correlations between the two diseases, one causing symptoms and the other only associated with the disease, were not significantly associated with the correlations between the symptoms. Whether the at-risk incidence reached 1, proportions diseased, risk ratios, at-risk incidence, and symptom correlations among those not diseased (baseline symptom correlations) were significantly and positively associated with the overall symptom correlations. The baseline incidence was negatively and significantly associated with the overall symptom correlations. The adjusted R-squared was 0.86 and 0.89 with at-risk incidence reaching 1 or not, respectively.
Diagnostic test accuracy of individual symptoms for the detection of the diseases. As expected from Table 1, the sensitivities and specificities of individual symptoms for disease diagnosis can be predicted with the at-risk incidence (Table 5) and 1 minus the baseline incidence (Table 6), respectively. The sensitivities of individual symptoms for disease diagnosis were 1 for all symptoms when the at-risk incidence reached 1. The effect sizes of disease correlations, proportions diseased, risk ratios, and baseline symptom correlations remained the same whether or not the at-risk incidence reached 1.
Table 5. Effects of baseline incidence, proportions diseased, risk ratios, and baseline symptom correlations on the sensitivities of individual symptoms for disease diagnosis. CI confidence interval, SD standard deviation.
Diagnostic test accuracy of symptom numbers for disease diagnosis. When using the cumulative numbers of symptoms to predict the disease directly causing the symptoms, we used the AUCs to compare the diagnostic test accuracy across the numbers of symptoms used. Figure 3 shows one example of the ROC curve assuming a risk ratio of 2, a baseline symptom incidence of 0.1, a proportion diseased of 0.05, no correlations between diseases, and no correlations between symptoms. When more symptoms were used for disease diagnosis, the AUCs increased. We selected the best set of sensitivities and specificities for disease diagnosis based on the sums of sensitivities and specificities (red dots in Fig. 3). The red dots also represent the diagnostic thresholds for disease diagnosis. For example, when using 40 symptoms for disease diagnosis, the threshold for obtaining the best set of sensitivities and specificities was 5.5 (see the red dots in Fig. 3). This result suggested that when 6 or more symptoms out of 40 were present in individual patients, the sensitivity to detect the disease cause was 86.3%. The specificity for correctly excluding the disease in individuals with fewer than 6 symptoms out of 40 was 79.4%. For other combinations of epidemiologic measures, see examples in Appendix 3.
Table 7 shows the effects of epidemiological measures on the AUCs of individual symptoms for disease diagnosis. The AUCs of individual symptoms can be explained fully by disease correlations, proportions diseased, baseline symptom incidence, risk ratios, at-risk symptom incidence, and symptom correlations (adjusted R-squared = 1 whether or not the at-risk incidence reached 1). When the at-risk incidence was less than 1, the at-risk incidence and baseline incidence had equal effect sizes in opposite directions, with regression coefficients of 0.5 and −0.5, respectively. When the at-risk incidence reached 1, the AUCs of individual symptoms decreased with the baseline symptom incidence (regression coefficient = −0.5) from 1 (perfect diagnostic accuracy).
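The regression coefficients of 0.5 and −0.5 are consistent with a short derivation. As a sketch, assume the AUC of a single binary symptom is the trapezoidal area under the ROC curve through the points (0, 0), (1 − specificity, sensitivity), and (1, 1); the paper does not state how the single-symptom AUC was computed, so this is an assumption. Using sensitivity = ir × rr and specificity = 1 − ir from the single-symptom equations:

```latex
\begin{align}
\mathrm{AUC}_{\text{single}}
  &= \tfrac{1}{2}\,\bigl(\text{sensitivity} + \text{specificity}\bigr) \\
  &= \tfrac{1}{2}\,\bigl(ir \times rr + 1 - ir\bigr) \\
  &= 0.5 + 0.5\,(ir \times rr) - 0.5\,ir .
\end{align}
```

When the at-risk incidence ir × rr reaches 1, this reduces to AUC = 1 − 0.5 × ir, which decreases from 1 with slope −0.5 in the baseline incidence.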
In Table 8, using a maximum of 40 symptoms for disease diagnosis, we analyzed the effects of the epidemiological measures on the observed maximal AUCs. The effect sizes and statistical significance of the epidemiologic measures depended on whether the at-risk incidence reached 1. The correlations between diseases were not significant (p > 0.68 for both). Proportions diseased were significantly and positively associated with the maximal AUCs (p < 0.05 for all). Baseline symptom incidence and symptom correlations were significantly and negatively associated with the maximal AUCs (p < 0.05 for all). The maximal AUCs can be well predicted by epidemiologic measures when the at-risk incidence reached 1 or not (adjusted R-squared = 0.83 and 0.80, respectively). Figure 4 presents the changes in the AUCs according to the numbers of symptoms used for disease diagnosis using simulations assuming 0.8 correlations between symptoms among those not diseased. We colored the AUCs based on the observed risk ratios and baseline symptom incidence. In each simulation, when the 95% CIs of the AUCs overlapped those of the maximal or minimal AUCs with risk ratios greater or less than 1, respectively, we colored the dots gray. The AUCs changed when we used more symptoms for disease diagnosis. In Fig. 4, the 95% CIs of all of the AUCs in the simulations assuming risk ratios as 1 overlapped with the 95% CIs of the maximal or minimal AUCs. The AUCs in the simulations assuming 0 and 0.4 symptom correlations among individuals not diseased are presented in Appendix 4.
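For the special case of uncorrelated symptoms, the dependence of the AUC on the number of symptoms can be computed exactly rather than simulated. The Python sketch below assumes independent symptoms with identical baseline incidence and risk ratio, so that symptom counts follow binomial distributions; this is a simplification introduced here, and the function names are hypothetical. The inputs mirror the assumptions stated for the Fig. 3 example (risk ratio 2, baseline incidence 0.1).

```python
from math import comb


def binom_pmf(k, p):
    """Probability of x = 0..k symptoms out of k independent symptoms."""
    return [comb(k, x) * p**x * (1 - p) ** (k - x) for x in range(k + 1)]


def exact_roc(k, ir, rr):
    """(t, sensitivity, specificity) for the rule 'at least t of k symptoms'."""
    pd = binom_pmf(k, min(ir * rr, 1.0))  # count distribution among those diseased
    pn = binom_pmf(k, ir)                 # count distribution among those not diseased
    return [(t, sum(pd[t:]), sum(pn[:t])) for t in range(k + 2)]


def exact_auc(k, ir, rr):
    """P(diseased count > non-diseased count) + 0.5 * P(tie)."""
    pd, pn = binom_pmf(k, min(ir * rr, 1.0)), binom_pmf(k, ir)
    auc, below = 0.0, 0.0  # below = P(non-diseased count < x)
    for x in range(k + 1):
        auc += pd[x] * (below + 0.5 * pn[x])
        below += pn[x]
    return auc


# Best set for 40 symptoms: sensitivity + specificity farthest from 1.
best = max(exact_roc(40, 0.1, 2), key=lambda r: abs(r[1] + r[2] - 1))
print(best)
for k in (1, 10, 20, 40):
    print(k, round(exact_auc(k, 0.1, 2), 3))
```

Under these assumptions the best rule is 6 or more symptoms out of 40, with a specificity of about 0.794 and a sensitivity of about 0.839. The specificity agrees with the simulated value reported for Fig. 3, and the sensitivity is close to the simulated 86.3%, which was estimated from a finite number of diseased subjects.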
The best sets of sensitivities and specificities for disease diagnosis chosen based on the AUCs are plotted in Figs. 5 and 6, respectively. We plotted the sensitivities and specificities according to the assumed risk ratios and baseline symptom incidence. When the 95% CIs of the AUCs overlapped with the 95% CIs of the maximal AUCs, we colored the dots gray. The roles of the epidemiologic measures in the best sets of sensitivities and specificities are listed in Tables 9 and 10, respectively. In Table 9, the best-set sensitivities chosen based on the maximal or minimal AUCs (when the risk ratios were greater or less than 1, respectively) were approximated with epidemiological measures. When the at-risk incidence reached 1, the sensitivities were 1, and the epidemiological measures were not significantly associated with the sensitivities. The correlations between diseases and proportions diseased were not significant when the at-risk incidence did not reach 1 (p > 0.05 for both). The baseline symptom incidence, risk ratios, and baseline symptom correlations were negatively associated with the best-set sensitivities when the at-risk incidence was less than 1 (p < 0.0001 for all). The variances of the best-set sensitivities can be explained mostly by epidemiological measures when the at-risk incidence was less than 1 (adjusted R-squared = 0.72).
Table 7. Effects of baseline incidence, proportions diseased, risk ratios, and baseline symptom correlations on the area under the receiver operating characteristic curve of individual symptoms for disease diagnosis. CI confidence interval, SD standard deviation.
In Table 10, the best sets of specificities chosen based on maximal or minimal AUCs-when the risk ratios were greater or less than 1, respectively-were approximated with epidemiological measures. The correlations between diseases and proportions diseased were not significantly associated with the best-set specificities when the at-risk incidence reached 1 or not (p > 0.05 for all). When the at-risk incidence was less than 1, the at-risk incidence was positively and significantly associated with specificities (p < 0.05). The baseline symptom incidence, risk ratios, and baseline symptom correlations were negatively and significantly associated with specificities, and the effect sizes depended on whether the at-risk incidence reached 1 (p < 0.0001 for all). The adjusted R-squared was 0.71 and 0.69 when the at-risk incidence reached 1 or not, respectively.
Diagnostic accuracy for the disease associated with the disease causing symptoms. The diagnostic accuracy for the disease associated with the disease causing symptoms was approximated with the epidemiological measures shown in Table 11. When the at-risk incidence was less than 1, the correlations between the diseases and their interaction terms with baseline symptom incidence, risk ratios, and baseline symptom correlations were significantly associated with the AUCs to predict the associated disease (p < 0.0001 for all). The main effects of baseline symptom incidence, risk ratios, and at-risk incidence also were significant (p < 0.0001 for all). When the at-risk incidence reached 1, the correlations between the diseases and their interaction terms with the baseline symptom incidence, risk ratios, and baseline symptom correlations remained significantly associated with the AUCs to predict the associated disease (p < 0.0001 for all). The proportions of the AUC variances explained by the epidemiological measures depended on whether the at-risk incidence reached 1 or not (adjusted R-squared = 0.96 and 0.66, respectively).
Figure 4. Areas under the receiver operating characteristic curves for disease diagnosis by numbers of symptoms, baseline symptom incidence, and symptom risk ratios. AUC area under curve, CI confidence interval, RR risk ratio, incidence baseline symptom incidence among those not diseased. Gray dots are the areas under the curve (AUCs) whose 95% confidence intervals (CIs) overlapped with the maximal AUC 95% CIs identified using a maximum of 40 symptoms for disease diagnosis. The lines were added to show the AUCs assuming the same epidemiological measures. All AUCs assuming 0.8 correlations between symptoms among those not diseased are illustrated.
Observed symptom correlations and incidence on the AUCs. In Table 12, the AUCs to predict the disease directly causing symptoms were approximated with observable measures: overall symptom correlations, overall symptom incidence, and numbers of symptoms used for disease diagnosis. The overall symptom correlations and numbers of symptoms were positively and significantly associated with AUCs for disease diagnosis (coefficients = 0.145 and 0.001, respectively; p < 0.0001 for both). The overall symptom incidence was negatively and significantly associated with AUCs (coefficient = −0.033, p < 0.0001). However, these three measures only explained a small fraction of the AUC variances for all risk ratios or when the risk ratios were greater than 1 (adjusted R-squared = 0.03 and 0.02, respectively).
Figure 5. Sensitivities for disease diagnosis by numbers of symptoms, baseline symptom incidence, and symptom risk ratios. AUC area under curve, CI confidence interval, RR risk ratio, incidence baseline symptom incidence among those not diseased. Gray dots are the areas under the curve (AUCs) whose 95% confidence intervals (CIs) overlapped with the maximal AUC 95% CIs identified using a maximum of 40 symptoms for disease diagnosis. The lines were added to show the AUCs assuming the same epidemiological measures. All AUCs assuming 0.8 correlations between symptoms among those not diseased are illustrated.

Discussions
This is the first study to estimate the diagnostic accuracy of single symptoms and the numbers of symptoms, based on simulations that have been used to demonstrate the biases in the diagnostic criteria of mental illnesses 2. When single symptoms are caused by a common disease and used to predict disease status, the sensitivities and specificities of single symptoms can be predicted fully with the at-risk incidence and 1 minus the baseline symptom incidence, respectively. This can be proved by mathematical equations or observed in simulations. However, when two or more symptoms with the same disease cause are used to estimate disease status, the estimates of the joint incidence rates, joint risk ratios, and joint at-risk incidence are required in the equations describing these multiple symptoms. Therefore, it becomes complicated to derive diagnostic accuracy in mathematical equations, and so it is practical to estimate the diagnostic accuracy of multiple symptoms through simulations. Key epidemiological measures for symptom development were identified in the equations: proportions diseased, baseline symptom incidence, and risk ratios of symptom development. The correlations between symptoms are important when more than one symptom is used for disease diagnosis. A combination of these epidemiological measures of the symptoms can be used to simulate symptom development according to disease status. When at most two symptoms occur in a population, the diagnostic accuracy (sensitivities and specificities) of having 0, 1, and 2 symptoms can be derived to construct an ROC curve and its AUC. By repeating this process until 40 symptoms are used, the AUCs increase, decrease, or remain around 0.5 when risk ratios are greater than 1, less than 1, or equal to 1, respectively. For a combination of the epidemiological measures, the maximal AUCs can be selected from the simulations. For a given AUC, we selected the best set of sensitivity and specificity as the one whose sum differed most from 1 in absolute value. The trade-off between sensitivities and specificities can be observed 1 when more symptoms are used for disease diagnosis. For a combination of epidemiological measures, AUCs tend to reach a plateau with fewer than 30 symptoms, particularly when baseline symptom correlations are closer to 0, i.e., symptoms are not statistically correlated.
Figure 6. Specificities for disease diagnosis by numbers of symptoms, baseline symptom incidence, and symptom risk ratios. AUC area under curve, CI confidence interval, RR risk ratio, incidence baseline symptom incidence among those not diseased. Gray dots are the areas under the curve (AUCs) whose 95% confidence intervals (CIs) overlapped with the maximal AUC 95% CIs identified using a maximum of 40 symptoms for disease diagnosis. The lines were added to show the AUCs assuming the same epidemiological measures. All AUCs assuming 0.8 correlations between symptoms among those not diseased are illustrated.
Table 9. Effects of baseline incidence, proportions diseased, risk ratios, and baseline symptom correlations on the sensitivities obtained from the maximal area under the receiver operating characteristic curve using at most 40 symptoms for disease diagnosis. CI confidence interval, SD standard deviation. Adjusted R-squared = 0.5.
Table 10. Effects of baseline incidence, proportions diseased, risk ratios, and baseline symptom correlations on the specificities obtained from the maximal area under the receiver operating characteristic curve using at most 40 symptoms for disease diagnosis. CI confidence interval, SD standard deviation.

The maximal AUCs can be well approximated with baseline incidence, risk ratios, at-risk incidence, and baseline symptom correlations (adjusted R-squared > 0.71). The best sets of sensitivities and specificities also can be well approximated with these measures (adjusted R-squared > 0.69). However, in the real world, symptom incidence and risk ratios cannot be determined when the disease status cannot be precisely confirmed. We found that the three observable measures (overall symptom correlations, overall symptom incidence, and numbers of symptoms) do not explain the AUC variances well (adjusted R-squared = 0.03). When researchers are confident that the RRs are greater than 1 (AUCs increase with the numbers of symptoms), the observable measures explain the AUC variances even worse (adjusted R-squared = 0.02).
Table 11. Effects of the correlations between diseases, baseline incidence, proportions diseased, risk ratios, and baseline symptom correlations on the areas under curves obtained from the maximal area under the receiver operating characteristic curve using at most 40 symptoms to predict the disease associated with the disease that caused symptoms. CI confidence interval, SD standard deviation.
Table 12. Role of numbers of symptoms, overall symptom correlations, and overall symptom incidence on the AUCs for disease diagnosis. CI confidence interval.

Evidence-based recommendations?
A previous study has provided several recommendations for how to use age-related symptoms to diagnose a geriatric syndrome, frailty 7. The first recommendation for using symptoms for frailty diagnosis was to explicitly select these symptoms based on their associations with health status 7. The authors did not provide recommendations about selecting symptoms directly associated with frailty 7. The second recommendation was to choose symptoms that become more prevalent with age 7. The third recommendation was to choose symptoms that do not saturate early in the life stage (do not become very prevalent among the elderly) 7. The fourth recommendation was to include symptoms developed from different systems 7, for example, not to include only symptoms related to changes in cognition 7. The last recommendation was to use the same frailty indices consisting of the same symptoms when the indices are used in the same populations at different time points 7. The authors thought different frailty indices often yield similar results in the same samples 7. One additional recommendation was to use at least 30 to 40 symptoms to create frailty indices, since they claimed that using more symptoms leads to more precise estimates 7.
No scientific evidence exists to support the first three above-mentioned recommendations 7 . In fact, these three recommendations are likely to contradict our findings. When symptoms were used to predict a disease not directly associated with them in our simulations, the diagnostic accuracy of the symptoms for the associated disease partly depended on the correlations between the associated disease and the disease that directly caused symptoms (Table 11). When health-related symptoms are chosen based on health status and used to predict frailty, the correlations between health status and frailty should be well determined to understand their role in the diagnostic accuracy of the health-related symptoms for frailty. The first recommendation failed to recognize that the diagnostic accuracy of the health-related symptoms for frailty diagnosis depends on the correlations between health status and frailty and their interaction terms with baseline symptom incidence, risk ratios, and baseline symptom correlations.
The second and third recommendations require the symptoms to also be associated with age 7. In addition to being caused by frailty in theory, the symptoms used to predict frailty are required to be associated with both health status and age. This approach creates a causal network that is difficult to simulate due to the large number of epidemiological measures involved, including the associations between age, health status, and frailty (3 parameters), how they interact with the baseline incidence and risk ratios of symptoms (3 × 2 parameters), and many others. This complexity is beyond what our simulations could handle, and thus further evidence to justify these recommendations would be required. However, to our knowledge, no clear evidence exists to support the hypothesized causal network associated with these two recommendations.
The second and third recommendations also impose limits on the prevalence of the symptoms for frailty diagnosis 7. The prevalence of frailty symptoms could not be too low because they need to increase with age according to the second recommendation 7. Frailty symptoms could not be too common, so that they would not saturate early 7. In our simulations, overall symptom incidence failed to explain a large proportion of AUC variances and was, in fact, negatively associated with diagnostic accuracy as measured by AUCs. When baseline symptom incidence (among those not diseased only) can be estimated, it is negatively associated with the specificities of individual symptoms. We do not have sufficient evidence to support the recommendations to select frailty symptoms based on overall symptom prevalence.
Our findings partly address the fourth recommendation, which encourages using symptoms from various human systems. Baseline symptom correlations (among those not diseased) are significantly and negatively associated with the maximal AUCs, whether or not the at-risk incidence among those diseased reached 1. This recommendation may make better sense when symptoms from various human systems are less correlated. In the simulations, the observable overall symptom correlations are significantly, though only slightly, positively associated with AUCs. It is unclear what ranges of correlations the authors of the recommendation intended, and whether this recommendation can be improved based on our findings.
The additional recommendation that encourages using more symptoms (at least 30) for disease diagnosis is not supported by any evidence 7 . Our simulations show that diagnostic accuracy measured with AUCs often reaches a plateau at 30 or fewer symptoms. Moreover, the frailty indices produced by the authors of the recommendations being discussed have been criticized for using an excessive number of symptoms 1 . Their frailty indices seem overcomplicated and can be simplified with fewer symptoms, because many of the input symptoms are correlated 1 .
Implications for the use of diagnostic criteria. Currently the diagnosis of many conditions, such as mental illnesses 2,30 and frailty indices 1,9, is based on composite diagnostic criteria. Both mental illnesses and frailty indices use symptoms to confirm diagnoses 1,2. However, several issues related to composite diagnostic criteria have recently been identified. The most important issue is that complicated diagnostic criteria introduce biases into the diagnoses 1,31. The input symptoms often are summed and censored with certain thresholds to derive intermediate variables or confirm diagnoses 1. When the numbers or sums of symptoms are censored, biases that are not explained by the input symptoms can be generated and introduced to the diagnoses 1. Therefore, the diagnoses of frailty have poor relationships with the input symptoms and do not predict major outcomes better than their input symptoms 1. When tested in trials, the use of diagnoses with poor interpretability, such as frailty, is associated with early termination of trials 32.
Based on the findings in the present study, several approaches can be used to improve current diagnostic strategies. First, under certain circumstances, single symptoms may achieve high sensitivity or specificity. To effectively detect the disease, single symptoms need to be rare among those not diseased (a low baseline incidence and thus a high specificity) and have high risk ratios of development due to the disease cause (high sensitivity). However, the baseline incidence and risk ratios of the symptoms used to diagnose several conditions, such as frailty 1 or mental illnesses 2 , have not been well demonstrated. Second, symptoms should be selected based on evidence, at least on the understanding of possible causes of the symptoms, estimated risk ratios, baseline symptom incidence, and baseline symptom correlations. We noticed that when the risk ratios were similar to 1, the maximal AUCs were around 0.5, and so the symptoms provided little diagnostic value. When the risk ratios were less than 1, suggesting that the presenting symptoms were less likely to be related to the disease, the AUCs were likely to be less than 0.5. When the risk ratios were greater than 1, the AUCs tended to exceed 0.5. When fewer than 30 symptoms are used for disease diagnosis, the AUCs can often reach plateau levels. Epidemiological measures have different impacts on the sensitivities and specificities obtained from the maximal or minimal AUCs using at most 40 symptoms, assuming risk ratios greater or less than 1, respectively.
Third, when the relationships between symptoms have been well explored, using the number of symptoms for disease diagnosis can effectively minimize the biases introduced by data censoring 1 . The biases induced by data censoring or categorization can lead to a diagnosis in which more than 70% of the variance can be explained by biases alone 1 .
Fourth, using more symptoms for diagnosis increases complexity. In the present study, when we used more symptoms for diagnosis, we found that their diagnostic accuracy could be improved according to AUCs. However, selecting single symptoms with a high diagnostic accuracy is much preferred because using multiple symptoms requires complex design, depends on well-tested thresholds, and needs to be justified with extensive research on these symptoms and their interactions.
Fifth, baseline symptom correlations are associated with the diagnostic accuracy (AUC) plateau that the symptoms can reach. The differences in the diagnostic accuracy of single symptoms and multiple symptoms are larger when the baseline symptom correlations are closer to 0. It is highly recommended that diagnoses consider the correlations between the symptoms among those diseased and those not diseased. Last, in the real world, when the disease cause remains to be investigated, researchers are unlikely to obtain a perfect estimate of baseline symptom incidence or risk ratios, or to confirm baseline symptom correlations among those not diseased. In our simulations, overall symptom correlations, overall symptom incidence, and numbers of symptoms were observable and could be easily obtained. If the risk ratios cannot be estimated at all, symptom correlations and numbers of symptoms are positively and significantly associated with the AUCs for disease diagnosis. The overall symptom incidence is negatively and significantly associated with the AUCs. However, the three observable measures explain only a small fraction of the variances of the AUCs for disease diagnosis (adjusted R-squared = 0.03). When researchers are confident that these symptoms are more likely to occur among those diseased (RR > 1), these three measures remain significant, although the fraction of the variances of the AUCs that they explain decreases further (adjusted R-squared = 0.02).
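The first approach above, selecting single symptoms that are rare among those not diseased and strongly raised by the disease, can be illustrated with a minimal Python sketch. It assumes that the at-risk incidence is the baseline incidence multiplied by the risk ratio and capped at 1, and that sensitivity and specificity equal the at-risk incidence and 1 minus the baseline incidence, respectively. The function name and example values are hypothetical and are not taken from the study's code.

```python
def single_symptom_accuracy(baseline_incidence, risk_ratio):
    """Return (sensitivity, specificity) of one symptom used as a diagnostic test.

    Assumptions (illustrative only):
    - at-risk incidence = baseline incidence * risk ratio, capped at 1
    - sensitivity = at-risk incidence (symptom incidence among those diseased)
    - specificity = 1 - baseline incidence (symptom-free share of those not diseased)
    """
    if not 0.0 <= baseline_incidence <= 1.0:
        raise ValueError("baseline_incidence must lie in [0, 1]")
    if risk_ratio < 0.0:
        raise ValueError("risk_ratio must be non-negative")
    at_risk_incidence = min(1.0, baseline_incidence * risk_ratio)
    sensitivity = at_risk_incidence
    specificity = 1.0 - baseline_incidence
    return sensitivity, specificity


# A rare symptom with a high risk ratio: high specificity, moderate sensitivity.
rare_strong = single_symptom_accuracy(0.05, 10.0)
# A common symptom whose at-risk incidence is capped at 1: perfect sensitivity,
# but only half of those not diseased are symptom-free.
common_capped = single_symptom_accuracy(0.5, 4.0)
```

Under these assumptions, a low baseline incidence is what drives specificity, while sensitivity depends on the product of baseline incidence and risk ratio, so a rare symptom needs a large risk ratio to be sensitive.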

Future research directions.
Several directions are open for future research. First, continuous variables can be used for disease diagnosis, which will require the development of complicated mathematical equations and add complexity to simulation and modeling. We will use the number of symptoms as the template for continuous-variable simulations. Second, more than one disease can often cause the same symptoms, which adds quite a few interaction terms to the epidemiological measures. When established, these models will provide valuable examples for real-world studies. Third, models that build on incremental improvement will be necessary. It is computationally impossible to implement models covering all possible values of the epidemiological measures on which symptom occurrence is based. However, it is relatively feasible to construct simulations that conform to well-studied association networks 33,34 and to epidemiological measures reasonably estimated with real-world data. Simulations can be used to support the findings from real-world data, and may provide lessons for causal inference. In future studies, we will implement more complicated simulations and explore the usefulness of simulations for causal inference.
Lastly, situations exist that involve more complicated diagnostic approaches, for example clinical case definitions used in outbreak investigations 35,36 . Case definitions may be applicable to patients experiencing symptoms or signs in certain times or places, depending on the diseases of interest 37 . For example, a clinical malaria case can be defined based on the presence of the pathogen in the blood and the occurrence of related symptoms within 2 days of examination 38 . These case definitions can be modified to suit outbreak investigations and settings 39 . Our findings help to demonstrate the key epidemiological parameters that researchers need to pay attention to when they aim to update case definitions. In an outbreak investigation, the information on these epidemiological measures should be systematically collected. We think it possible to improve case definitions using updated information on these measures. This finding needs to be studied further in the future.
Limitations. Our simulation study depended on various assumptions: one disease causing multiple symptoms, similar symptom incidence, similar risk ratios causing symptoms, and similar correlations between symptoms among those not diseased. A related disease was set up to occur in association with the symptom-causing disease. This related disease was not significantly associated with the symptoms' diagnostic accuracy for disease diagnosis (AUCs, sensitivities, and specificities). However, the simulations are not likely to match the complex multi-cause examples commonly seen in the real world. For example, the symptoms of frailty, a geriatric syndrome, can be linked to frailty and many other causes 1,6 . [...] due to the random assignment to different simulated populations. These variations may lead to slight differences in the simulation results.

Conclusion
Assuming symptoms are caused by a single disease, they occur based on four epidemiological measures: proportions diseased, baseline symptom incidence, risk ratios, and baseline symptom correlations. The symptom incidence among those diseased, the at-risk incidence, can reach a maximum of 1. The sensitivities and specificities of single symptoms for disease diagnosis can be fully predicted by the at-risk incidence and 1 minus the baseline incidence, respectively. When the disease causes multiple symptoms based on similar epidemiological measures, these symptoms can be used for disease diagnosis. For example, when two symptoms are used for disease diagnosis, the sensitivities and specificities of having 0, 1, or 2 symptoms can be calculated to draw a ROC curve and derive its AUC. When the same procedure is repeated using 1 to 40 symptoms for disease diagnosis, the maximal AUCs can be obtained, and the best sets of sensitivities and specificities can be selected from them. The above-mentioned epidemiological measures can explain large fractions of the variances of the maximal AUCs and the best sets of sensitivities and specificities. These findings are important for researchers who want to assess composite diagnostic criteria that are subject to biases and lack an evidence base. For example, the recommendations on constructing a frailty index have been widely used 7 . However, these recommendations neglect the role of these epidemiological measures and focus on observable measures (overall symptom incidence and numbers of symptoms) that do not explain symptom diagnostic accuracy well.
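The multi-symptom procedure summarized above can be sketched in Python under simplifying assumptions. The sketch assumes that symptoms occur independently given disease status, whereas the study imposed correlations between symptoms, and that the at-risk incidence is the baseline incidence multiplied by the risk ratio, capped at 1. It also reads the "best set" as the cutoff that maximizes |sensitivity + specificity − 1|, which is one possible interpretation. All function names, parameter values, and the random seed are hypothetical.

```python
import random


def simulate_counts(n_subjects, prop_diseased, baseline_incidence,
                    risk_ratio, n_symptoms, seed=0):
    """Simulate (diseased, symptom_count) pairs; symptoms are independent (assumption)."""
    rng = random.Random(seed)
    at_risk_incidence = min(1.0, baseline_incidence * risk_ratio)
    records = []
    for _ in range(n_subjects):
        diseased = rng.random() < prop_diseased
        p = at_risk_incidence if diseased else baseline_incidence
        count = sum(rng.random() < p for _ in range(n_symptoms))
        records.append((diseased, count))
    return records


def roc_points(records, n_symptoms):
    """Return (threshold, sensitivity, specificity) for 'count >= threshold' tests."""
    n_pos = sum(1 for diseased, _ in records if diseased)
    n_neg = len(records) - n_pos
    points = []
    for threshold in range(n_symptoms + 2):
        true_pos = sum(1 for d, c in records if d and c >= threshold)
        true_neg = sum(1 for d, c in records if not d and c < threshold)
        points.append((threshold, true_pos / n_pos, true_neg / n_neg))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve (x = 1 - specificity, y = sensitivity)."""
    xy = sorted((1.0 - spec, sens) for _, sens, spec in points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


def best_cutoff(points):
    """Cutoff maximizing |sensitivity + specificity - 1| (one reading of 'best set')."""
    return max(points, key=lambda p: abs(p[1] + p[2] - 1.0))


# Illustrative run: 5 symptoms, baseline incidence 0.1, risk ratio 3.
records = simulate_counts(2000, 0.3, 0.1, 3.0, n_symptoms=5, seed=1)
points = roc_points(records, n_symptoms=5)
print("AUC:", round(auc(points), 3))
print("best (threshold, sensitivity, specificity):", best_cutoff(points))
```

Repeating the run while varying `n_symptoms` would trace how the AUC changes with the number of symptoms. Because the sketch omits symptom correlations, it should not be expected to reproduce the plateau values reported in the study.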